Hybrid buffer management scheme for immutable pages

ABSTRACT

Exemplary embodiments provide a database query processing system comprising a hybrid buffer pool stored in a computer main memory coupled to a processor, the hybrid buffer pool comprising: a shared buffer pool of page frames that contain dirty data pages that are modified and will be written back to storage; and an immutable buffer pool that temporarily contains read-only data pages from the storage, wherein the shared buffer pool is different from the immutable buffer pool. A page multiplexer that identifies which ones of the data pages from storage to store in the immutable buffer pool based at least in part on information from a query processor.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. Provisional patentapplication Ser. No. 14/281,453, filed May 19, 2014, and is incorporatedherein by reference.

BACKGROUND

Most database systems preallocate memory space and manage for bufferingand caching data pages from storage. Modern database buffers and filesystem caches include a pool of page frames which can consume severalgigabytes of memory space. This memory space is used to stage data fromstorage for query processing and to provide fast accesses to data. Forbest performance, these page frames are typically managed by a bufferreplacement algorithm such as least recently used (LRU) scheme in orderto keep data most likely to be accessed in the future in the memory. Inreal environments, numerous processes contend for these page frames, andthe buffer replacement algorithm should be designed sophisticatedly.However, the exemplary embodiments observe that database systems oftencannot saturate the storage device (i.e., SSD) regardless of clientscale. An in-depth performance analysis of database systems reveal thatpage-reads are often delayed by preceding page-writes when there is highconcurrency among reads and writes. This “read blocked by write” problemcan negatively impact CPU utilization.

BRIEF SUMMARY

Exemplary embodiments provide a database query processing systemcomprising a hybrid buffer pool stored in a computer main memory coupledto a processor, the hybrid buffer pool comprising: a shared buffer poolof page frames that contain dirty data pages that are modified and willbe written back to storage; and an immutable buffer pool thattemporarily contains read-only data pages from the storage, wherein theshared buffer pool is different from the immutable buffer pool. A pagemultiplexer that identifies which ones of the data pages from storage tostore in the immutable buffer pool based at least in part on informationfrom a query processor.

BRIEF DESCRIPTION OF SEVERAL VIEWS OF THE DRAWINGS

These and/or other features and utilities of the present generalinventive concept will become apparent and more readily appreciated fromthe following description of the embodiments, taken in conjunction withthe accompanying drawings of which:

FIG. 1 is a block diagram illustrating a database query processingsystem having a hybrid buffer pool having a shared buffer pool and animmutable buffer pool in accordance with the exemplary embodiments.

FIG. 2 is a block diagram illustrating components comprising animmutable buffer pool.

FIG. 3 is a table illustrating conditions that must be satisfied inorder for the page multiplexer to identify data pages that can be storedin the immutable buffer pool.

FIG. 4A is a flow diagram illustrating assignment of a circular pagebuffer to a process by the immutable buffer manager.

FIG. 4B is a flowchart illustrating allocation of an immutable bufferpage frame to a process by the immutable buffer manager.

DETAILED DESCRIPTION

Reference will now be made in detail to the embodiments of the presentgeneral inventive concept, examples of which are illustrated in theaccompanying drawings, wherein like reference numerals refer to the likeelements throughout. The embodiments are described below in order toexplain the present general inventive concept while referring to thefigures.

Advantages and features of the present invention and methods ofaccomplishing the same may be understood more readily by reference tothe following detailed description of embodiments and the accompanyingdrawings. The present general inventive concept may, however, beembodied in many different forms and should not be construed as beinglimited to the embodiments set forth herein. Rather, these embodimentsare provided so that this disclosure will be thorough and complete andwill fully convey the concept of the general inventive concept to thoseskilled in the art, and the present general inventive concept will onlybe defined by the appended claims. In the drawings, the thickness oflayers and regions are exaggerated for clarity.

The use of the terms “a” and “an” and “the” and similar referents in thecontext of describing the invention (especially in the context of thefollowing claims) are to be construed to cover both the singular and theplural, unless otherwise indicated herein or clearly contradicted bycontext. The terms “comprising,” “having,” “including,” and “containing”are to be construed as open-ended terms (i.e., meaning “including, butnot limited to,”) unless otherwise noted.

The term “component” or “module”, as used herein, means, but is notlimited to, a software or hardware component, such as a fieldprogrammable gate array (FPGA) or an application specific integratedcircuit (ASIC), which performs certain tasks. A component or module mayadvantageously be configured to reside in the addressable storage mediumand configured to execute on one or more processors. Thus, a componentor module may include, by way of example, components, such as softwarecomponents, object-oriented software components, class components andtask components, processes, functions, attributes, procedures,subroutines, segments of program code, drivers, firmware, microcode,circuitry, data, databases, data structures, tables, arrays, andvariables. The functionality provided for the components and componentsor modules may be combined into fewer components and components ormodules or further separated into additional components and componentsor modules.

Unless defined otherwise, all technical and scientific terms used hereinhave the same meaning as commonly understood by one of ordinary skill inthe art to which this invention belongs. It is noted that the use of anyand all examples, or exemplary terms provided herein is intended merelyto better illuminate the invention and is not a limitation on the scopeof the invention unless otherwise specified. Further, unless definedotherwise, all terms defined in generally used dictionaries may not beoverly interpreted.

This invention proposes a hybrid buffer management scheme to solve the“read blocked by write” problem, so that database systems can not onlyimprove the CPU utilization in terms of cache hit ratio and effectivecycles but also fully leverage storage performance potentials.

FIG. 1 is a block diagram illustrating a database query processingsystem 10 having a hybrid buffer pool having a shared buffer pool and animmutable buffer pool in accordance with the exemplary embodiments. Thedatabase query processing system 10 may include the client computer 12in communication with the database server 14 over a network (not shown).The database server 14 may include typical computer components includinga processor 16, a memory 18, and high-speed storage 20 (e.g., solidstate drive (SSD)).

In operation, a user or an application on the client computer 12 mayissue queries, e.g., a SQL command, to the database server 14. A queryprocessor 22 executing on the processor 14 receives, parses and executesthe query. Database systems in general, including the database server14, stage data pages 24 from storage 20 into a shared buffer pool 26 inmemory 18 in order to serve the query. Therefore, the database server 14allocates memory space, referred to as page frames 28, to read in anumber of data pages 24 from storage 20. This shared buffer pool 26 is acommon space for data processing. In conventional database systems, theshared buffer pool would contain both clean data pages from storage 20that are read-only, and dirty data pages 30 that are modified after aread that need to be written back to storage 20.

For best performance, the data pages in the shared buffer pool 26 aretypically managed by a buffer replacement algorithm, such as leastrecently used (LRU) scheme in order to keep data most likely to beaccessed in the future in the memory. In the meantime, the databaseserver 14 writes back the dirty data pages 30 to storage 20 in thebackground to free up the buffer space for future data processing.

Regardless of buffer replacement algorithms, conventional databaseservers sometimes faces the situation where dirty data pages 30 need tobe written back to storage 20 in order to allocate a free space in theshared buffer pool 26 when the system workload is high and the sharedbuffer pool 26 is full. As a result, a read operation may be blocked insome systems until a write operation completes even though the write andthe read are independent each other. The exemplary embodiments refer tothis issue as a “read blocked by write” problem. Note that this blockingof a read is different from the situation where a data read is blockeduntil preceding writes to the same data complete in order to preservethe “read after write” consistency semantics.

For example, when MySQL cannot find a page to process in a shared bufferpool, MySQL tries to secure a free page frame 28 before issuing the readrequest to storage. In general, a free page frame 28 can be found in theshared buffer pool if free page frames are collected in advance.However, when write backlogs are very high, no free page frame 28 can befound in the shared buffer pool. In this case, the database servertraverses the LRU list of page frames 28, starting from the oldest pageframe to search a victim page to write back to storage. This causesperformance issues both in the host and storage because all bufferaccesses are serialized on the traverse of LRU list to evict dirty datapages 30 and because write back to storage take a long time.

This symptom is common for conventional commercial database systemsincluding Oracle as well as MySQL InnoDB. This serialization limits notonly the parallelism in database servers but also the IO requests tostorage. However, these performance issues are not noticeable if thestorage comprises a typical hard disk drive (HDD) because I/O mayalready be saturated. In contrast, database servers with highperformance storage such as SSD suffer from low utilization of hostcomputing resources and storage bandwidth.

According to the exemplary embodiments, a hybrid buffer managementscheme is proposed that address the serialization issue of LRU listtraverse for the shared buffer pool 26 and the “read blocked by write”problem. The hybrid buffer management scheme may include an immutablebuffer pool 34 in memory 18, a page multiplexer 36, and an immutablebuffer manager 38 executing on a processor 16. Although not shown, thepage multiplexer 36 may also execute on the processor 16.

Thus, a database hybrid buffer pool is provided comprising both theshared buffer pool 26 containing dirty data pages 30 that have beenmodified and will be written back to storage 20, as well as theimmutable buffer pool 34 that temporarily contains read-only data pages32 from storage 20, i.e., data pages accessed by a read-only query. Forinstance, data pages accessed by a select query without an update areread-only. The shared buffer pool 26 may store conventional long-livedshared data pages, while the immutable buffer pool may store short-livedread-only data pages 32, where a short-lived data page is valid onlyduring the lifespan of a query.

The immutable buffer 34, which has minimal maintenance overhead, avoidsunnecessary delay due to the eviction of dirty data pages 30 from theshared buffer pool 26. According to the exemplary embodiment, theimmutable buffer 34 is a temporary buffer, assigned to a process, whichis local, read-only, short-lived, lock-free, and recyclable. Thisspecial buffer can be used for data pages that satisfy certainconditions and can improve the overall system and storage utilization byavoiding unnecessary delay due to the eviction of dirty data pages 30,while prior long-lived shared buffers (or caches) experience contentionsfor free pages.

The page multiplexer 36 identifies which ones of the data pages 24 fromstorage 20 to store in the immutable buffer pool 34 based at least inpart on information from the query processor 22. In one embodiment, thepage multiplexer 36 receives information from the query processor 22 inthe form of query hints 39. As described in FIG. 3, the query processoranalyzes a given query and gives hints if it is a read query; if it canuse an index; and/or if it accesses pages infrequently.

The hybrid buffer pool is provided for queries against read-only(immutable) pages 32 to avoid the “read blocked by write” problem and toavoid or minimize the issue of the serialization. The hybrid buffer poolhas several advantages since it does not allow rarely accessed pages toenter the shared buffer pool 26. One advantage is that the databaseserver 14 can issue more read requests to storage 20 since the readoperations are not blocked by the write operations. This increases theutilization of the storage 20 and the database server 14, andconsequently improves the throughput of the system and the response timeof each query. Another advantage is that this hybrid buffer pool is alsobeneficial to other processes that need to read data pages into a bufferbecause programs are less-often contending for free pages. Yet a furtheradvantage is that the hybrid buffer scheme can avoid the time consumingbuffer management overhead for cold pages so that CPU time may besignificantly saved.

FIG. 2 is a block diagram illustrating components comprising animmutable buffer pool. According to one embodiment, the immutable bufferpool 34′ can be implemented as a circular array 200 of fixed-sizedcircular page buffers 202, each comprising a plurality of page frames208. In one embodiment, each of the circular page buffers 202 may beidentified using an array index 204, where indices of the array index204 may enumerate the circular page buffers 202. In one embodiment, thearray index 204 (or another data structure) may include an indication ofwhether each of the circular page buffers 202 is used or free.

In one embodiment, the size of circular array 202 should be sufficientlylarge to support the number of concurrent processes 206 that the systemsupports. The size of circular page buffers 202 may be determined byfactors including at least I/O bandwidth and the number of concurrentprocesses.

The immutable buffer pool 34 can be used to store or stage data pages 24that satisfy predetermined conditions. Since the immutable buffer pool34 is a one-time read-only memory space, the immutable buffer pool 34can be recycled at any time after a query lifespan.

FIG. 3 is a table illustrating conditions that must be satisfied inorder for the page multiplexer 36 to identify data pages 24 that can bestored in the immutable buffer pool 34. In one embodiment, the pagemultiplexer 36 may identify data pages that to be stored in the mutablebuffer pool based at least in part on query hints 39 from the queryprocessor. The page multiplexer 36 identifies and stores in theimmutable buffer pool 34 any data pages 24 that satisfy the followingconditions 300:

1) Any data page 24 in the long tail that has no spatial or temporallocality, which is infrequently accessed—pages that satisfy theconditions are typically accessed just once at a time and not accessedafterward for long time, and consequently cache is wasted;

2) Any data page 24 resulting from read queries with weak consistency.An example includes but is not limited to, a READ COMMITTED mode in ANSIisolation level where the query results can be different over timebecause the query is applied to a snapshot of table at a certain time.

3) Any data page that is accessed through a table index for a point orrange query. Examples include, but are not limited to, leaf nodes of anIndex Only Table of a storage engine (e.g., InnoDB for MySQL), andrandomly accessed data pages of heap file of an Oracle database.

In one embodiment, the page multiplexer system does not cache internalindex nodes of the index tree to the immutable buffer pool 34, butrather to the shared buffer pool 26 because internal index nodes havehigh temporal locality.

In one embodiment, the immutable buffer manager 38 is responsible forassigning circular page buffers 202 to requesting processes 206, whichin one embodiment may also include threads. In one embodiment, the pagemultiplexer system may determine if data pages for the process are forthe immutable buffer first, and then the immutable buffer managerassigns circular page buffers and page frames to requesting processes.

FIG. 4A is a flow diagram illustrating assignment of a circular pagebuffer to a process by the immutable buffer manager 38. The process maybegin by the immutable buffer manager 38 receiving notification from thepage multiplexer that a data page 24 for a requesting process 206 hasbeen identified for staging in the immutable buffer pool 34′ (block400).

The immutable buffer manager 48 determines if this will be the firstcircular page buffer 202 assigned this process (block 402). If so, theimmutable buffer manager 48 finds a free circular page buffer 202 in thecircular array 200 (block 406). In one embodiment, the immutable buffermanager 48 finds a free circular page buffer 202 that is under an arraylock to avoid a race condition in case of its multi-threadedimplementation.

If this is not the first circular page buffer 202 assigned to thisprocess 206 (block 402), then immutable buffer manager 48 determines ifthe process 206 is in single buffer mode (block 404). If the process 206is in single buffer mode, then a page frame 28 from the shared bufferpool 26 is assigned to the process 206 (block 412).

If it is determined that this is the first circular page buffer 202assigned the process (block 402) or that the process 206 is not insingle buffer mode (block 404), then immutable page buffer 48 attemptsto find a free circular page buffer 202 in the circular array 200 (block406). If it is determined that a free circular page buffer 202 isavailable (block 408), then the immutable buffer manager 48 assigns thefree circular page buffer 202 to the process (block 410). In oneembodiment, assignment of the free circular page buffer 202 may furtherinclude assigning one of the indices of the array index 204 of a freecircular page buffer 202 to the process 206.

If no free circular page buffer 202 is available, then then the sharedbuffer pool 26 may be used and a page frame 28 from the shared bufferpool 26 is assigned to the process 206 (block 412).

In one embodiment, the immutable buffer manager may assign a page frameto the requesting process from the circular page buffer when themultiplexer requests a page frame for a data page.

FIG. 4B is a flowchart illustrating allocation of a page frame from anassigned circular page buffer to a process by the immutable buffermanager 38. The process may begin by finding a free page frame 208 fromthe circular page buffer 202 assigned to the process 206 (block 450). Ifa free page frame 208 is found (block 452), then the free page frame isassigned to the process 206 (block 454). If a page frame 208 cannot befound (block 452), then it is determined if the process is in singlebuffer mode (block 456). If the process 206 is in single buffer mode,then a page frame 28 from the shared buffer pool 26 is assigned to theprocess 206 (block 458). If the process 206 is not in single buffermode, then another circular page buffer 202 is assigned to the process206 (block 460) and the process continues by finding a free page framein the assigned circular page buffer (block 450). If the process cannotget another circular page buffer, it can retry until a timer expires orthe number of retrials reaches a maximum.

Once a circular page buffer 202 is assigned to a process 206, theprocess 206 has an exclusive access to the circular page buffer 202. Inone embodiment, a single circular page buffer 202 is assigned to aprocess 206 by default, but more than one circular page buffer 202 canbe assigned to the process 206, depending on the system workload.

In one embodiment, the immutable buffer manager can implement afair-sharing mechanism to prevent one process from using up theimmutable buffer. For instance, if the process 206 waits on a page frameof a circular page buffer 206 for a predetermined amount of time becauseall page frames in a circular buffer are already assigned, the immutablebuffer manager 38 can assign additional circular buffers 202.Consequently, other processes can have page frames assigned during thewait time period. In a light loaded condition, the process 206 obtains asufficient number of circular page buffers 202 to avoid starvation forbuffers.

Once any process 206 starts to use the shared buffer pool 26, theimmutable buffer manager 38 may not assign more than one circular pagebuffer 202 to the processes 206 after that. If any process already hasmore than one assigned circular page buffer, the assigned circular pagebuffers 26 may remain assigned until the used circular page buffers arerecycled. In this exemplary embodiment, serialization takes place onlywhen an array index that assigns a circular buffer from the circularbuffer array to a process is determined for a multi-threadedimplementation, which is very short and negligible.

The present invention has been described in accordance with theembodiments shown, and there could be variations to the embodiments, andany variations would be within the spirit and scope of the presentinvention. For example, the exemplary embodiment can be implementedusing hardware, software, a computer readable medium containing programinstructions, or a combination thereof. Software written according tothe present invention is to be either stored in some form ofcomputer-readable medium such as a memory, a hard disk, or a CD/DVD-ROMand is to be executed by a processor. Accordingly, many modificationsmay be made by one of ordinary skill in the art without departing fromthe spirit and scope of the appended claims.

We claim:
 1. A database query processing system, comprising: a hybridbuffer pool stored in a computer main memory coupled to a processor, thehybrid buffer pool comprising: a shared buffer pool of page frames thatcontain dirty data pages that are modified and will be written back tostorage; and an immutable buffer pool that temporarily containsread-only data pages from the storage, wherein the shared buffer pool isdifferent from the immutable buffer pool; and a page multiplexer thatidentifies which ones of the data pages from storage to store in theimmutable buffer pool based at least in part on information from a queryprocessor.
 2. The database query processing system of claim 1, whereinthe immutable buffer pool comprises a circular array of fixed-sizedcircular page buffers, each comprising a plurality of page frames. 3.The database query processing system of claim 2, wherein each of thecircular page buffers is identified using an array index, where indicesof the array index enumerate the circular page buffers.
 4. The databasequery processing system of claim 3, wherein the array index includes anindication of whether each of the circular page buffers is used or free.5. The database query processing system of claim 2, further comprisingan immutable buffer manager that assigns at least one of the circularpage buffers to a requesting process.
 6. The database query processingsystem of claim 5, wherein assignment of the at least one circular pagebuffer comprises: responsive to determining that this is the firstcircular page buffer assigned to the process, assigning a free circularpage buffer to the process; and responsive to determining that no freecircular page buffer is available, assigning a page frame from theshared buffer pool to the process; and responsive to determining this isnot the first circular page buffer assigned to the process, determiningif the process is in single buffer mode, and if so, assigning a pageframe from the shared buffer pool to the process.
 7. The database queryprocessing system of claim 5, further comprising allocation of at leastone page frame from a circular page buffer assigned to the process by:responsive to finding a free page frame from the circular page buffer,assigning the free page frame to the process; responsive to not findinga free page frame, determining if the process is in single buffer mode,and if so, assigning a page frame from the shared buffer pool to theprocess; and responsive to determining that the process is not in singlebuffer mode, assigning another circular page buffer to the process. 8.The database query processing system of claim 1, wherein conditions thatmust be satisfied in order for the page multiplexer to identify the datapages for storage in the immutable buffer pool comprise: any data pagein the long tail that has no spatial or temporal locality; any data pageresulting from read queries with weak consistency; and any data pagethat is accessed through a table index for a point or range query. 9.The database query processing system of claim 1, wherein the immutablebuffer pool is a one-time read-only memory space that is recycled at anytime after a query lifespan.
 10. The database query processing system ofclaim 1, wherein the shared buffer pool stores long-lived shared datapages, and the immutable buffer pool stores short-lived read-only datapages that avoids delay due to eviction of dirty data pages from theshared buffer pool.
 11. The database query processing system of claim 1,wherein the page multiplexer receives information from the queryprocessor in the form of query hints.
 12. A database server havingaccess to a storage of data pages, a processor; a main memory coupled tothe processor, the main memory including a hybrid buffer pool,comprising: a shared buffer pool of page frames that contains dirty datapages that are modified and will be written back to storage; and animmutable buffer pool that temporarily contains read-only data pagesfrom the storage, wherein the shared buffer pool is different from theimmutable buffer pool; and a page multiplexer executed by the processorthat identifies which ones of the data pages from storage to store inthe immutable buffer pool based at least in part on information from aquery processor.
 13. The database server of claim 11, wherein theimmutable buffer pool comprises a circular array of fixed-sized circularpage buffers, each comprising a plurality of page frames.
 14. Thedatabase server of claim 13, wherein each of the circular page buffersis identified using an array index, where indices of the array indexenumerate the circular page buffers.
 15. The database server of claim14, wherein the array index includes an indication of whether each ofthe circular page buffers is used or free.
 16. The database server ofclaim 13, further comprising an immutable buffer manager executed by theprocessor that assigns at least one of the circular page buffers to arequesting process.
 17. The database server of claim 16, whereinassignment of the at least one circular page buffer comprises:responsive to determining that this is the first circular page bufferassigned to the process, assigning a free circular page buffer to theprocess; and responsive to determining that no free circular page bufferis available, assigning a page frame from the shared buffer pool to theprocess; and responsive to determining this is not the first circularpage buffer assigned to the process, determining if the process is insingle buffer mode, and if so, assigning a page frame from the sharedbuffer pool to the process.
 18. The database server of claim 16, furthercomprising allocation of at least one page frame from a circular pagebuffer assigned to the process by: responsive to finding a free pageframe from the circular page buffer, assigning the free page frame tothe process; responsive to not finding a free page frame, determining ifthe process is in single buffer mode, and if so, assigning a page framefrom the shared buffer pool to the process; and responsive todetermining that the process is not in single buffer mode, assigninganother circular page buffer to the process.
 19. The database server ofclaim 12, wherein conditions that must be satisfied in order for thepage multiplexer to identify the data pages for storage in the immutablebuffer pool comprise: any data page in the long tail that has no spatialor temporal locality; any data page resulting from read queries withweak consistency; and any data page that is accessed through a tableindex for a point or range query.
 20. The database server of claim 12,wherein the immutable buffer pool is a one-time read-only memory spacethat is recycled at any time after a query lifespan.
 21. The databaseserver of claim 12, wherein the shared buffer pool stores long-livedshared data pages, and the immutable buffer pool stores short-livedread-only data pages that avoids delay due to eviction of dirty datapages from the shared buffer pool.
 22. The database server of claim 12,wherein the page multiplexer receives information from the queryprocessor in the form of query hints.